Why AI That Teaches Itself to Achieve a Goal Is the Next Big Thing
- by 7wData
What’s the difference between the creative power of game-playing AIs and the predictive AIs most companies seem to use? How they learn. The AIs that thrive at games like Go, creating never-before-seen strategies, use an approach called reinforcement learning: a mature machine learning technology that’s good at optimizing tasks in which an agent takes a series of actions over time, where each action is informed by the outcome of the previous ones, and where you can’t find a “right” answer the way you can with a prediction. It’s a powerful technology, but most companies don’t know how or when to apply it. The authors argue that reinforcement learning algorithms are good at automating and optimizing in dynamic situations with nuances that would be too hard to describe with formulas and rules.
Lee Sedol, a world-class Go champion, was flummoxed by the 37th move DeepMind’s AlphaGo made in the second game of the famous 2016 series. So flummoxed that it took him nearly 15 minutes to formulate a response. The move was strange to other experienced Go players as well, with one commentator suggesting it was a mistake. In fact, it was a canonical example of an artificial intelligence algorithm learning something that seemed to go beyond just pattern recognition in data: something strategic and even creative. Indeed, beyond just feeding the algorithm past examples of Go champions playing games, DeepMind developers trained AlphaGo by having it play many millions of matches against itself. During these matches, the system had the chance to explore new moves and strategies, and then evaluate whether they improved performance. Through all this trial and error, it discovered a way to play the game that surprised even the best players in the world.
If this kind of AI with creative capabilities seems different from the chatbots and predictive models most businesses end up with when they apply machine learning, that’s because it is. Instead of machine learning that uses historical data to generate predictions, game-playing systems like AlphaGo use reinforcement learning, a mature machine learning technology that’s good at optimizing tasks. To do so, an agent takes a series of actions over time, and each action is informed by the outcome of the previous ones. Put simply, it works by trying different approaches and latching onto (reinforcing) the ones that seem to work better than the others. With enough trials, you can reinforce your way to beating your current best approach and discover a new best way to accomplish your task.
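To make the "try, then reinforce what works" idea concrete, here is a minimal sketch of one of the simplest trial-and-error learners, an epsilon-greedy bandit. It is not how AlphaGo works, and it ignores the sequential part of the problem; it only illustrates the explore-and-reinforce loop. The function name `epsilon_greedy`, the `success_rates` values, and the `epsilon` setting are all assumptions made up for this illustration.

```python
import random


def epsilon_greedy(success_rates, trials=5000, epsilon=0.1, seed=0):
    """Learn by trial and error which of several approaches pays off most.

    success_rates: hypothetical hidden probability that each approach succeeds.
    The learner never sees these numbers; it only observes rewards.
    """
    rng = random.Random(seed)
    n = len(success_rates)
    estimates = [0.0] * n  # running average reward observed per approach
    counts = [0] * n       # how many times each approach has been tried

    for _ in range(trials):
        if rng.random() < epsilon:
            # Explore: occasionally try an approach at random.
            choice = rng.randrange(n)
        else:
            # Exploit: otherwise use the approach that looks best so far.
            choice = max(range(n), key=lambda i: estimates[i])

        # The environment returns a reward of 1 (worked) or 0 (did not).
        reward = 1.0 if rng.random() < success_rates[choice] else 0.0

        # Reinforce: nudge the estimate toward the observed reward.
        counts[choice] += 1
        estimates[choice] += (reward - estimates[choice]) / counts[choice]

    return estimates, counts


# Three made-up approaches with hidden success rates of 20%, 50%, and 80%.
estimates, counts = epsilon_greedy([0.2, 0.5, 0.8])
best = max(range(len(estimates)), key=lambda i: estimates[i])
print("estimated value per approach:", [round(e, 2) for e in estimates])
print("approach the learner favors:", best)
```

The small `epsilon` is the design choice that matters here: without occasional random tries, the learner could lock onto the first approach that happens to work and never discover a better one.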
Despite its demonstrated usefulness, however, reinforcement learning is mostly used in academia and niche areas like video games and robotics. Companies such as Netflix, Spotify, and Google have started using it, but most businesses lag behind. Yet opportunities are everywhere. In fact, any time you have to make decisions in sequence (what AI practitioners call sequential decision tasks), there is a chance to deploy reinforcement learning.
Consider the many real-world problems that require deciding how to act over time, where there is something to maximize (or minimize), and where you’re never explicitly given the correct solution. For example:
If you’re a company leader, there are likely many processes you’d like to automate or optimize, but that are too dynamic, or have too many exceptions and edge cases, to program into software. Through trial and error, reinforcement learning algorithms can learn to solve even the most dynamic optimization problems, opening up new avenues for automation and personalization in quickly changing environments.
Many businesses think of machine learning systems as “prediction machines” and apply algorithms to forecast things like cash flow or customer attrition based on data such as transaction patterns or website analytics behavior. These systems tend to use what’s called supervised machine learning.
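For contrast with trial-and-error learning, a supervised "prediction machine" can be sketched as fitting a model to historical examples whose correct answers are already known, then using it to forecast. The sketch below uses ordinary least squares on a straight line; the `fit_line` helper and the cash-flow figures are hypothetical, invented purely for illustration, and real forecasting systems are typically far richer.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up history: month number and the cash flow observed that month.
# Every example comes with its known "right answer" (the label).
months = [1, 2, 3, 4, 5]
cash_flow = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(months, cash_flow)

# Forecast the next month from the pattern in the historical data.
forecast = slope * 6 + intercept
print("forecast for month 6:", forecast)
```

The key difference from the reinforcement setting is that each training example here carries its correct answer, so the model never has to act, observe an outcome, and adjust.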